Overview

Dataset statistics

Number of variables: 9
Number of observations: 440
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 31.1 KiB
Average record size in memory: 72.3 B

Variable types

NUM: 7
CAT: 2

Warnings

Detergents_Paper is highly correlated with Grocery (High correlation)
Grocery is highly correlated with Detergents_Paper (High correlation)
Buyer/Spender has unique values (Unique)

Reproduction

Analysis started: 2020-09-11 02:27:26.937804
Analysis finished: 2020-09-11 02:27:46.132400
Duration: 19.19 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
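
A report of this kind can be regenerated roughly as follows. This is a minimal sketch, assuming pandas and pandas-profiling v2.9.0 are installed; the helper name `build_report` and both file paths are hypothetical and do not come from the report itself.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile the CSV at csv_path and write an HTML report (hypothetical helper)."""
    # Imported inside the function so that defining it does not require the libraries.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Pandas Profiling Report")
    profile.to_file(html_path)  # writes the HTML report to html_path
    return profile
```

Calling `build_report("customers.csv")` with a placeholder path would profile that file; the actual source file behind this report is not named here.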

Variables

Buyer/Spender
Real number (ℝ≥0)

UNIQUE

Distinct: 440
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 220.5
Minimum: 1
Maximum: 440
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 22.95
Q1: 110.75
median: 220.5
Q3: 330.25
95-th percentile: 418.05
Maximum: 440
Range: 439
Interquartile range (IQR): 219.5

Descriptive statistics

Standard deviation: 127.1613149
Coefficient of variation (CV): 0.5766953055
Kurtosis: -1.2
Mean: 220.5
Median Absolute Deviation (MAD): 110
Skewness: 0
Sum: 97020
Variance: 16170
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)
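
The Buyer/Spender figures above are consistent with a plain row identifier running from 1 to 440. The sketch below recomputes several of them with the standard library under that assumption (each integer from 1 to 440 appearing exactly once). The `quantile` helper is illustrative and uses linear interpolation between neighbouring ranks, which is assumed here rather than confirmed to be the report's method.

```python
import statistics


def quantile(sorted_values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Assumption: Buyer/Spender takes each integer from 1 to 440 exactly once.
values = list(range(1, 441))

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)
cv = std_dev / mean                     # coefficient of variation
q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
iqr = q3 - q1
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation

print(mean, variance, round(std_dev, 7), round(cv, 10), q1, q3, iqr, mad)
```

Under that assumption the printed values should line up with the mean, variance, standard deviation, CV, quartiles, IQR and MAD listed for this variable.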
Value               Count  Frequency (%)
440                 1      0.2%
151                 1      0.2%
140                 1      0.2%
141                 1      0.2%
142                 1      0.2%
143                 1      0.2%
144                 1      0.2%
145                 1      0.2%
146                 1      0.2%
147                 1      0.2%
Other values (430)  430    97.7%

Value   Count  Frequency (%)
1       1      0.2%
2       1      0.2%
3       1      0.2%
4       1      0.2%
5       1      0.2%

Value   Count  Frequency (%)
440     1      0.2%
439     1      0.2%
438     1      0.2%
437     1      0.2%
436     1      0.2%

Channel
Categorical

Distinct: 2
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB

Value   Count  Frequency (%)
Hotel   298    67.7%
Retail  142    32.3%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 6
Median length: 5
Mean length: 5.322727273
Min length: 5
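
The length statistics for Channel follow directly from the category counts. As a small worked sketch (variable names are illustrative), the label lengths can be expanded once per occurrence and summarised:

```python
# Category counts copied from the Channel table in this report.
counts = {"Hotel": 298, "Retail": 142}

total = sum(counts.values())
# One entry per observation: the character length of its label, sorted for the median.
lengths = sorted(len(label) for label, n in counts.items() for _ in range(n))

mean_length = sum(lengths) / total
middle = total // 2
if total % 2 == 0:
    median_length = (lengths[middle - 1] + lengths[middle]) / 2
else:
    median_length = lengths[middle]

# Percentage share of each category, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(min(lengths), median_length, round(mean_length, 9), max(lengths), shares)
```

The mean length is (5 × 298 + 6 × 142) / 440, which agrees with the value reported above.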

Region
Categorical

Distinct: 3
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB

Value   Count  Frequency (%)
Other   316    71.8%
Lisbon  77     17.5%
Oporto  47     10.7%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 6
Median length: 5
Mean length: 5.281818182
Min length: 5

Fresh
Real number (ℝ≥0)

Distinct: 433
Distinct (%): 98.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12000.29773
Minimum: 3
Maximum: 112151
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 401.9
Q1: 3127.75
median: 8504
Q3: 16933.75
95-th percentile: 36818.5
Maximum: 112151
Range: 112148
Interquartile range (IQR): 13806

Descriptive statistics

Standard deviation: 12647.32887
Coefficient of variation (CV): 1.053917924
Kurtosis: 11.53640849
Mean: 12000.29773
Median Absolute Deviation (MAD): 5919.5
Skewness: 2.561322752
Sum: 5280131
Variance: 159954927.4
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
9670                2      0.5%
3                   2      0.5%
8040                2      0.5%
514                 2      0.5%
18044               2      0.5%
3366                2      0.5%
7149                2      0.5%
1420                1      0.2%
4456                1      0.2%
13134               1      0.2%
Other values (423)  423    96.1%

Value   Count  Frequency (%)
3       2      0.5%
9       1      0.2%
18      1      0.2%
23      1      0.2%
37      1      0.2%

Value   Count  Frequency (%)
112151  1      0.2%
76237   1      0.2%
68951   1      0.2%
56159   1      0.2%
56083   1      0.2%

Milk
Real number (ℝ≥0)

Distinct: 421
Distinct (%): 95.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5796.265909
Minimum: 55
Maximum: 73498
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 55
5-th percentile: 593.75
Q1: 1533
median: 3627
Q3: 7190.25
95-th percentile: 16843.4
Maximum: 73498
Range: 73443
Interquartile range (IQR): 5657.25

Descriptive statistics

Standard deviation: 7380.377175
Coefficient of variation (CV): 1.273298584
Kurtosis: 24.66939775
Mean: 5796.265909
Median Absolute Deviation (MAD): 2460
Skewness: 4.053754849
Sum: 2550357
Variance: 54469967.24
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1897                2      0.5%
5139                2      0.5%
659                 2      0.5%
829                 2      0.5%
944                 2      0.5%
2884                2      0.5%
3880                2      0.5%
1032                2      0.5%
577                 2      0.5%
3199                2      0.5%
Other values (411)  420    95.5%

Value   Count  Frequency (%)
55      1      0.2%
112     1      0.2%
134     1      0.2%
201     1      0.2%
254     1      0.2%

Value   Count  Frequency (%)
73498   1      0.2%
54259   1      0.2%
46197   1      0.2%
43950   1      0.2%
38369   1      0.2%

Grocery
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 430
Distinct (%): 97.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7951.277273
Minimum: 3
Maximum: 92780
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 851.45
Q1: 2153
median: 4755.5
Q3: 10655.75
95-th percentile: 24033.5
Maximum: 92780
Range: 92777
Interquartile range (IQR): 8502.75

Descriptive statistics

Standard deviation: 9503.162829
Coefficient of variation (CV): 1.195174373
Kurtosis: 20.91467039
Mean: 7951.277273
Median Absolute Deviation (MAD): 3093.5
Skewness: 3.58742869
Sum: 3498562
Variance: 90310103.75
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1664                2      0.5%
2405                2      0.5%
1493                2      0.5%
1563                2      0.5%
3600                2      0.5%
683                 2      0.5%
2406                2      0.5%
6536                2      0.5%
10391               2      0.5%
2062                2      0.5%
Other values (420)  420    95.5%

Value   Count  Frequency (%)
3       1      0.2%
137     1      0.2%
218     1      0.2%
223     1      0.2%
245     1      0.2%

Value   Count  Frequency (%)
92780   1      0.2%
67298   1      0.2%
59598   1      0.2%
55571   1      0.2%
45828   1      0.2%

Frozen
Real number (ℝ≥0)

Distinct: 426
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3071.931818
Minimum: 25
Maximum: 60869
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 25
5-th percentile: 136.85
Q1: 742.25
median: 1526
Q3: 3554.25
95-th percentile: 9930.75
Maximum: 60869
Range: 60844
Interquartile range (IQR): 2812

Descriptive statistics

Standard deviation: 4854.673333
Coefficient of variation (CV): 1.580332384
Kurtosis: 54.6892807
Mean: 3071.931818
Median Absolute Deviation (MAD): 1084.5
Skewness: 5.907985692
Sum: 1351650
Variance: 23567853.17
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
744                 2      0.5%
779                 2      0.5%
1619                2      0.5%
364                 2      0.5%
848                 2      0.5%
4324                2      0.5%
937                 2      0.5%
830                 2      0.5%
2540                2      0.5%
402                 2      0.5%
Other values (416)  420    95.5%

Value   Count  Frequency (%)
25      1      0.2%
33      1      0.2%
36      1      0.2%
38      1      0.2%
42      1      0.2%

Value   Count  Frequency (%)
60869   1      0.2%
36534   1      0.2%
35009   1      0.2%
18711   1      0.2%
18028   1      0.2%

Detergents_Paper
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 417
Distinct (%): 94.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2881.493182
Minimum: 3
Maximum: 40827
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 63.7
Q1: 256.75
median: 816.5
Q3: 3922
95-th percentile: 12043.2
Maximum: 40827
Range: 40824
Interquartile range (IQR): 3665.25

Descriptive statistics

Standard deviation: 4767.854448
Coefficient of variation (CV): 1.654647139
Kurtosis: 19.00946434
Mean: 2881.493182
Median Absolute Deviation (MAD): 715.5
Skewness: 3.631850631
Sum: 1267857
Variance: 22732436.04
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
227                 2      0.5%
311                 2      0.5%
118                 2      0.5%
811                 2      0.5%
788                 2      0.5%
153                 2      0.5%
96                  2      0.5%
93                  2      0.5%
284                 2      0.5%
955                 2      0.5%
Other values (407)  420    95.5%

Value   Count  Frequency (%)
3       2      0.5%
5       1      0.2%
7       1      0.2%
9       1      0.2%
10      1      0.2%

Value   Count  Frequency (%)
40827   1      0.2%
38102   1      0.2%
26701   1      0.2%
24231   1      0.2%
24171   1      0.2%

Delicatessen
Real number (ℝ≥0)

Distinct: 403
Distinct (%): 91.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1524.870455
Minimum: 3
Maximum: 47943
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 63.95
Q1: 408.25
median: 965.5
Q3: 1820.25
95-th percentile: 4485.4
Maximum: 47943
Range: 47940
Interquartile range (IQR): 1412

Descriptive statistics

Standard deviation: 2820.105937
Coefficient of variation (CV): 1.849406898
Kurtosis: 170.6949393
Mean: 1524.870455
Median Absolute Deviation (MAD): 637.5
Skewness: 11.15158648
Sum: 670943
Variance: 7952997.498
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
834                 4      0.9%
3                   4      0.9%
548                 3      0.7%
1215                3      0.7%
395                 3      0.7%
610                 3      0.7%
290                 2      0.5%
379                 2      0.5%
46                  2      0.5%
750                 2      0.5%
Other values (393)  412    93.6%

Value   Count  Frequency (%)
3       4      0.9%
7       1      0.2%
8       1      0.2%
11      1      0.2%
18      2      0.5%

Value   Count  Frequency (%)
47943   1      0.2%
16523   1      0.2%
14472   1      0.2%
14351   1      0.2%
8550    1      0.2%

Interactions

[Figures: 49 interaction plots, Matplotlib v3.3.1]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
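
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code used by the profiling tool; the function name `pearson_r` is an assumption, and the sample values are the Grocery and Detergents_Paper entries from the ten first rows shown in the Sample section.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # The 1/(n - 1) factors of the covariance and the standard deviations cancel,
    # so the raw sums of products and squares are enough.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Grocery and Detergents_Paper from the ten first rows of the sample.
grocery = [7561, 9568, 7684, 4221, 7198, 5126, 6975, 9426, 6192, 18881]
detergents_paper = [2674, 3293, 3516, 507, 1777, 1795, 3140, 3321, 1716, 7425]
print(round(pearson_r(grocery, detergents_paper), 3))
```

Ten rows are far too few to reproduce the report's coefficient for the full 440 observations; the snippet only illustrates the arithmetic.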

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
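
As a rough sketch of that recipe, the values can be replaced by their ranks and the Pearson formula applied to the ranks. The helper names are illustrative, and giving tied values the average of their ranks is an assumption (a common convention), not something stated in this report.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sum_xx = sum((a - mean_x) ** 2 for a in rx)
    sum_yy = sum((b - mean_y) ** 2 for b in ry)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# A monotonic but nonlinear relationship: y = x ** 3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))
```

Because the cubic relationship preserves order exactly, the rank variables coincide and ρ reaches its maximum, whereas Pearson's r on the raw values would be below 1.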

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
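
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows; the function name is illustrative. Library implementations often apply a tie correction (tau-b) instead, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count toward the total only."""
    pairs = list(combinations(zip(xs, ys), 2))
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)


# Six pairs in total: five concordant and one discordant (the swapped 3 and 2).
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The quadratic pair enumeration is fine for a few hundred observations such as the 440 rows here; larger datasets would call for an O(n log n) method.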

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
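
For illustration, a bias-corrected Cramér's V can be sketched from a contingency table as below. This follows the commonly cited form of Bergsma's correction and is not taken from the profiling tool's source; the function name and the example tables are hypothetical, since the report does not include a Channel by Region cross-tabulation.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic from observed and expected counts.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # The correction shrinks phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical 2x2 tables: proportional rows (independence) and a diagonal table.
print(cramers_v_corrected([[10, 20], [20, 40]]))
print(cramers_v_corrected([[50, 0], [0, 50]]))
```

The first table has proportional rows, so the statistic is zero; the second is perfectly associated, so the corrected coefficient stays at its upper bound.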

Missing values

[Figures: 2 missing-value plots, Matplotlib v3.3.1]

Sample

First rows

     Buyer/Spender  Channel  Region  Fresh  Milk   Grocery  Frozen  Detergents_Paper  Delicatessen
0    1              Retail   Other   12669  9656   7561     214     2674              1338
1    2              Retail   Other   7057   9810   9568     1762    3293              1776
2    3              Retail   Other   6353   8808   7684     2405    3516              7844
3    4              Hotel    Other   13265  1196   4221     6404    507               1788
4    5              Retail   Other   22615  5410   7198     3915    1777              5185
5    6              Retail   Other   9413   8259   5126     666     1795              1451
6    7              Retail   Other   12126  3199   6975     480     3140              545
7    8              Retail   Other   7579   4956   9426     1669    3321              2566
8    9              Hotel    Other   5963   3648   6192     425     1716              750
9    10             Retail   Other   6006   11093  18881    1159    7425              2098

Last rows

     Buyer/Spender  Channel  Region  Fresh  Milk   Grocery  Frozen  Detergents_Paper  Delicatessen
430  431            Hotel    Other   3097   4230   16483    575     241               2080
431  432            Hotel    Other   8533   5506   5160     13486   1377              1498
432  433            Hotel    Other   21117  1162   4754     269     1328              395
433  434            Hotel    Other   1982   3218   1493     1541    356               1449
434  435            Hotel    Other   16731  3922   7994     688     2371              838
435  436            Hotel    Other   29703  12051  16027    13135   182               2204
436  437            Hotel    Other   39228  1431   764      4510    93                2346
437  438            Retail   Other   14531  15488  30243    437     14841             1867
438  439            Hotel    Other   10290  1981   2232     1038    168               2125
439  440            Hotel    Other   2787   1698   2510     65      477               52